Abstract

In [Fortini et al., Stoch. Proc. Appl. 100 (2002), 147-165] it is demonstrated
that a recurrent Markov exchangeable process in the sense of Diaconis and Freedman
is essentially a partially exchangeable process in the sense of de Finetti. For
finite sequences no such equivalence holds. We analyze both finite partially
exchangeable and finite Markov exchangeable binary sequences and formulate
necessary and sufficient conditions for extendibility in both cases.

1 Introduction

A finite sequence of r.v.s $(X_1, \dots, X_n)$ defined on a common probability space is said
to be exchangeable (sometimes $n$-exchangeable) if its joint distribution is invariant under
permutations of its components. The sequence may or may not be the initial segment of
a longer exchangeable sequence, i.e. it may or may not be "extendible", and it
is said to be $\infty$-extendible if it is the initial segment of an infinite exchangeable sequence.
De Finetti characterized all the $n$-exchangeable sequences of r.v.s taking values in a finite
space $I$, regardless of their extendibility, as unique mixtures of certain $n$-exchangeable,
not extendible distributions, namely the hypergeometric processes. From this result
he was able to prove his representation theorem for infinite exchangeable
sequences by a passage to the limit, and in [5] he derived necessary and sufficient conditions
for the extendibility of $\{0,1\}$-valued finite sequences by a geometric approach (see also [7],
[in], and [531).
Under partial exchangeability, introduced in [3] (pp. 193-205 of [17]), $(X_1, \dots, X_n)$ is
divided into groups or subsequences (e.g. women and men) according to a characteristic
we consider relevant (e.g. each unit's sex), and exchangeability is retained
only for variables within the same subsequence. Again, we can represent every finite
partially exchangeable sequence as a mixture of not extendible, partially exchangeable
sequences, and an analogous representation theorem holds if all the exchangeable subsequences
forming it are $\infty$-extendible. De Finetti in [3] (pp. 147-227 of [3]) suggested
taking, in a sequence of observations, the last observation preceding the present one as the
relevant characteristic, which defines an interesting case of partial exchangeability. Consider
a finite state space $I$; call the variables immediately following any occurrence of $i \in I$
the successors of $i$. Then the subsequences forming the partially exchangeable sequence
are those constituted by the successors of each state in $I$. He apparently suggested the
possibility of characterizing, by the usual passage to the limit, the mixtures of Markov
chains as partially exchangeable processes of that kind.

Diaconis and Freedman in [5] demonstrated that the limit argument does not hold for
mixtures of transient Markov chains. They dropped the intuitive idea of "relevant
characteristics", introduced a different notion of partial exchangeability in terms of sufficient
statistics (we will call this case Markov exchangeability) and characterized the mixtures
of Markov chains under the additional assumption of recurrence of the process. In [13]
it is demonstrated that the two definitions (the one in terms of subsequences and the one in
terms of sufficient statistics) coincide for recurrent processes, but they differ for
finite sequences.

We will focus on partial exchangeability in the sense of de Finetti and Markov
exchangeability for finite sequences of $\{0,1\}$-valued variables, and on the respective
notions of extendibility. Some necessary conditions for the extendibility of a partially
exchangeable finite sequence have been studied in [21] and in ^U\. Finite Markov
exchangeable sequences have been analyzed in [211 US], but, as far as I know, no criterion
for extendibility in the Markov exchangeable case has been given. In Section 2 we define
a general framework in order to analyze this topic. In Section 3 we analyze the partially
exchangeable case. In particular we present two bijective transformations of the
probabilities defining a binary partially exchangeable distribution (i.e. two alternative
parameterizations). The first, introduced by de Finetti, allows us to establish necessary
and sufficient conditions for extendibility, developing the geometric approach presented
in [5], [7] and [2] for the simply exchangeable case. The second parameterization, in
terms of generalized covariances, allows us to derive some simpler necessary conditions
related to the central moments of the mixing distributions. In Section 4 we formulate
analogous results for Markov exchangeable distributions.

2 A general setting 

A sequence partially exchangeable in the sense of de Finetti is essentially a set of distinct
exchangeable subsequences. The concept of partial exchangeability has been extended
in various ways, relating it to ergodic theory and to the extreme point representation of a convex
set (see [2], [12], [11], [1, chap. 12]). In our case of discrete time processes taking
values in a finite state space, we will refer to a simple formalization in terms of sufficient
statistics borrowed from [8] (see also [M]). With this formalization, we can also represent
simple exchangeability and partial exchangeability in the sense of de Finetti.

Let $(\Omega, \mathcal{F}, P)$ be the probability space on which all the r.v.s in the sequel will be
defined. Consider a sequence of $n$ r.v.s $(X_1, \dots, X_n)$, each taking values in a finite set
$I$. Consider a statistic $T$ from $I^n$ into a finite set $\{t_1, \dots, t_z\}$. We call the sequence, as
well as its joint distribution, $n$-partially exchangeable with respect to $T$ if:

\[ T(\mathbf{x}_1) = T(\mathbf{x}_2) \Rightarrow P(\mathbf{x}_1) = P(\mathbf{x}_2) \quad \forall \mathbf{x}_1, \mathbf{x}_2 \in I^n \qquad (1) \]

That is, $T$ induces a partition of $I^n$ into $z$ equivalence classes and $P$ assigns the
same probability to the elements within the same class. So we can say that $T$ is a
minimal sufficient statistic for $(X_1, \dots, X_n)$ under $P$. Denote by $[t_i]$ the set
$\{\mathbf{x} \in I^n : T(\mathbf{x}) = t_i\}$; denote $P(\mathbf{x} \in [t_i])$ as $w_{t_i}$, and the probability of any specified sequence in
$[t_i]$ as $p_{t_i}$. We have $w_{t_i} = |[t_i]| \cdot p_{t_i}$, where $|[t_i]|$ denotes the cardinality of the set
$[t_i]$, and the distribution of $(X_1, \dots, X_n)$ is completely defined by the $z$ probabilities
$(w_{t_1}, \dots, w_{t_z})$ subject to $\sum_{i=1}^{z} w_{t_i} = 1$. Conversely, any set of nonnegative
values $(w_{t_1}, \dots, w_{t_z})$ having sum 1 defines a sequence $n$-partially exchangeable w.r.t.
$T$. Consequently the space

\[ \Diamond_z = \Big\{ (w_{t_1}, \dots, w_{t_z}) : w_{t_i} \geq 0, \ i = 1, \dots, z, \ \sum_{i=1}^{z} w_{t_i} = 1 \Big\} \qquad (2) \]

which is the $(z-1)$-dimensional unitary simplex embedded in $\mathbb{R}^z$, represents all the
distributions $n$-partially exchangeable w.r.t. $T$. Let $h_{[t_i]}(\mathbf{x}) = P(\mathbf{x} \mid T(\mathbf{x}) = t_i)$ be
the conditional probability distribution on $I^n$ given $T$, assigning equal masses to all the
sequences in the equivalence class $[t_i]$ and mass 0 to the other sequences:

\[ h_{[t_i]}(\mathbf{x}) = \begin{cases} 1/|[t_i]| & \text{if } \mathbf{x} \in [t_i] \\ 0 & \text{otherwise} \end{cases} \]

Then the following, stated in [8], is plain:

Theorem 2.1 ([8]). The set of all the distributions over $I^n$ partially exchangeable w.r.t.
$T$ is a simplex whose vertices are the extremal distributions $h_{[t_i]}$, $i = 1, \dots, z$, and each
partially exchangeable distribution is a unique mixture of those extremal distributions
with mixing weights $w_{t_1}, \dots, w_{t_z}$.

The extremal distributions can be conceived as urn processes without replacement,
and, depending on the properties of $T$, de Finetti-style theorems may be deduced from
the convergence of the hypergeometric processes to the i.i.d. processes.
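The mixture representation of Theorem 2.1 can be illustrated with a minimal sketch. It assumes a small binary example with the statistic $T$ equal to the number of ones (simple exchangeability); the helper names `equivalence_classes` and `mixture` and the mixing weights are hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product


def equivalence_classes(states, n, T):
    """Partition states^n into the classes [t] = {x : T(x) = t}."""
    classes = defaultdict(list)
    for x in product(states, repeat=n):
        classes[T(x)].append(x)
    return dict(classes)


def mixture(classes, weights):
    """Joint law P(x) = sum_t w_t * h_[t](x), with h_[t] uniform on [t]."""
    return {x: weights[t] / len(xs) for t, xs in classes.items() for x in xs}


# Simple exchangeability of three binary variables: T counts the ones.
classes = equivalence_classes((0, 1), 3, sum)
weights = {t: Fraction(1, 4) for t in classes}  # hypothetical mixing weights
P = mixture(classes, weights)
print(P[(1, 0, 0)])  # 1/12: the class with a single one has three elements
```

Any other statistic $T$ (for instance the pair of group sums, or the first state together with the transition counts) can be passed in place of `sum` without changing the rest of the sketch.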

3 Partially exchangeable binary sequences in the sense of de Finetti 

We say that $(X_1, \dots, X_n)$ is partially exchangeable in the sense of de Finetti of order
$(n_1, \dots, n_g)$, and we will denote it $(n_1, \dots, n_g)$-DFPE, if it can be divided into $g$
exchangeable subsequences $(X_{i,1}, \dots, X_{i,n_i})$, $i = 1, \dots, g$, with $\sum_{i=1}^{g} n_i = n$. Denote $\sum_{j=1}^{n_i} X_{i,j}$
as $S_i$. If the variables are $\{0,1\}$-valued, $(S_1, \dots, S_g)$ is a sufficient statistic in the sense
of (1). Denote $P(\mathbf{x} \in I^n : S_1 = k_1, \dots, S_g = k_g)$ as $w^{(n_1,\dots,n_g)}_{k_1,\dots,k_g}$ and the probability of
any sequence consistent with $(S_1 = k_1, \dots, S_g = k_g)$ as $p^{(n_1,\dots,n_g)}_{k_1,\dots,k_g}$. Then we have:

\[ w^{(n_1,\dots,n_g)}_{k_1,\dots,k_g} = \binom{n_1}{k_1} \cdots \binom{n_g}{k_g} \, p^{(n_1,\dots,n_g)}_{k_1,\dots,k_g} \qquad (3) \]

An $(n_1, \dots, n_g)$-DFPE distribution is defined by the $(n_1+1) \cdots (n_g+1)$ probabilities
$w^{(n_1,\dots,n_g)}_{k_1,\dots,k_g}$ defined for every $g$-tuple of nonnegative integers $(k_1, \dots, k_g)$ such that
$0 \leq k_i \leq n_i$ for $i = 1, \dots, g$, subject to

\[ \sum_{k_1=0}^{n_1} \cdots \sum_{k_g=0}^{n_g} w^{(n_1,\dots,n_g)}_{k_1,\dots,k_g} = 1 \]

For what we have said in (2), the $\{w^{(n_1,\dots,n_g)}_{k_1,\dots,k_g}\}_{k_i \leq n_i,\ i=1,\dots,g}$ range in the unitary simplex
$\Diamond_{(n_1+1)\cdots(n_g+1)}$.



The exchangeability of each subsequence $(X_{i,1}, \dots, X_{i,n_i})$ implies the exchangeability
of all its subsets, and we can obtain all the probabilities of the kind
$\{w^{(m_1,\dots,m_g)}_{l_1,\dots,l_g}\}_{l_i \leq m_i,\ i=1,\dots,g}$, with $m_i \leq n_i$, from the
$\{w^{(n_1,\dots,n_g)}_{k_1,\dots,k_g}\}$ through the following easily proved formula:

\[ w^{(m_1,\dots,m_g)}_{l_1,\dots,l_g} = \sum_{k_1=l_1}^{n_1-m_1+l_1} \cdots \sum_{k_g=l_g}^{n_g-m_g+l_g} \frac{\binom{k_1}{l_1}\binom{n_1-k_1}{m_1-l_1}}{\binom{n_1}{m_1}} \cdots \frac{\binom{k_g}{l_g}\binom{n_g-k_g}{m_g-l_g}}{\binom{n_g}{m_g}} \, w^{(n_1,\dots,n_g)}_{k_1,\dots,k_g} \qquad (4) \]



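Denote in particular the probabilities $w^{(k_1,\dots,k_g)}_{k_1,\dots,k_g}$ as $w_{k_1,\dots,k_g}$. We have

\[ w_{k_1,\dots,k_g} = P\Big( \bigcap_{i=1}^{g} \{ X_{i,s_1} = 1, \dots, X_{i,s_{k_i}} = 1 \} \Big) = E\Big[ \prod_{i=1}^{g} X_{i,s_1} \cdots X_{i,s_{k_i}} \Big] \qquad (5) \]

for every subset $(s_1, \dots, s_{k_i})$ of $k_i$ labels in $\{1, \dots, n_i\}$, $i = 1, \dots, g$. By (4) we have

\[ w_{k_1,\dots,k_g} = \sum_{i_1=k_1}^{n_1} \cdots \sum_{i_g=k_g}^{n_g} \frac{(i_1)_{k_1}}{(n_1)_{k_1}} \cdots \frac{(i_g)_{k_g}}{(n_g)_{k_g}} \, w^{(n_1,\dots,n_g)}_{i_1,\dots,i_g} \qquad (6) \]

where, from now on, $(i)_k = i(i-1) \cdots (i-k+1)$ for $k \leq i$ and $(i)_0 = 1$.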

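To define the inverse map of (6), introduce the difference operator w.r.t. the $i$-th
group: $\Delta_i (w_{k_1,\dots,k_g}) = w_{k_1,\dots,k_i+1,\dots,k_g} - w_{k_1,\dots,k_i,\dots,k_g}$. Then we have (see 0j)

\[ w^{(n_1,\dots,n_g)}_{k_1,\dots,k_g} = \binom{n_1}{k_1} \cdots \binom{n_g}{k_g} (-1)^{\sum_{i=1}^{g}(n_i-k_i)} \Delta_1^{n_1-k_1} \cdots \Delta_g^{n_g-k_g} (w_{k_1,\dots,k_g}) \qquad (7) \]

where $w_{0,\dots,0} = 1$. So the $\{w_{k_1,\dots,k_g}\}_{k_i \leq n_i,\ i=1,\dots,g}$ suffice to completely define any
$(n_1, \dots, n_g)$-DFPE binary sequence, i.e. they constitute a parameterization of an
$(n_1, \dots, n_g)$-DFPE binary distribution.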
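The two maps between the parameterizations can be checked numerically. The following is a minimal sketch, assuming the reading of (6) and (7) as a falling-factorial sum and a signed iterated difference respectively; the function names `to_moments` and `from_moments` and the $(1,2)$-DFPE distribution `wn` are hypothetical.

```python
from fractions import Fraction
from itertools import product
from math import comb


def falling(x, k):
    """Falling factorial (x)_k = x (x-1) ... (x-k+1), with (x)_0 = 1."""
    out = 1
    for j in range(k):
        out *= x - j
    return out


def to_moments(wn, n):
    """Map (6): from the probabilities w^(n)_k to the parameters w_k."""
    idx = list(product(*(range(m + 1) for m in n)))
    w = {}
    for k in idx:
        total = Fraction(0)
        for i in idx:
            coef = Fraction(1)
            for ig, kg, ng in zip(i, k, n):
                coef *= Fraction(falling(ig, kg), falling(ng, kg))
            total += coef * wn[i]
        w[k] = total
    return w


def from_moments(w, n):
    """Map (7): binomial factors times signed iterated differences of w_k."""
    wn = {}
    for k in product(*(range(m + 1) for m in n)):
        m = [ng - kg for ng, kg in zip(n, k)]
        total = Fraction(0)
        # (-1)^sum(m) * Delta^m w_k = sum_j (-1)^sum(j) * prod C(m_g, j_g) * w_{k+j}
        for j in product(*(range(mg + 1) for mg in m)):
            coef = (-1) ** sum(j)
            for mg, jg in zip(m, j):
                coef *= comb(mg, jg)
            total += coef * w[tuple(kg + jg for kg, jg in zip(k, j))]
        for ng, kg in zip(n, k):
            total *= comb(ng, kg)
        wn[k] = total
    return wn


# Hypothetical (1,2)-DFPE distribution: keys are (k_1, k_2), values sum to 1.
n = (1, 2)
wn = {(0, 0): Fraction(1, 8), (0, 1): Fraction(1, 4), (0, 2): Fraction(1, 8),
      (1, 0): Fraction(1, 8), (1, 1): Fraction(1, 8), (1, 2): Fraction(1, 4)}
w = to_moments(wn, n)
print(w[(1, 0)], from_moments(w, n) == wn)
```

Exact rational arithmetic is used so that the round trip can be compared with `==` rather than with a tolerance.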

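By (7), in an $(n_1, \dots, n_g)$-DFPE sequence each probability $w_{k_1,\dots,k_g}$ should satisfy

\[ (-1)^{\sum_{i=1}^{g}(n_i-k_i)} \Delta_1^{n_1-k_1} \cdots \Delta_g^{n_g-k_g} (w_{k_1,\dots,k_g}) \geq 0 \qquad (8) \]

Moreover, since by (6) it is $\sum_{k_1=0}^{n_1} \cdots \sum_{k_g=0}^{n_g} w^{(n_1,\dots,n_g)}_{k_1,\dots,k_g} = w_{0,\dots,0} = 1$, the (8) constitute
necessary and sufficient conditions for a set $\{w_{k_1,\dots,k_g}\}_{k_i \leq n_i,\ i=1,\dots,g}$ with $w_{0,\dots,0} = 1$ to define
an $(n_1, \dots, n_g)$-DFPE sequence. Then the $\{w_{k_1,\dots,k_g}\}_{k_i \leq n_i,\ i=1,\dots,g}$ range in the space

\[ \Lambda_{n_1,\dots,n_g} = \Big\{ (w_{k_1,\dots,k_g})_{k_i \leq n_i,\ i=1,\dots,g} : w_{k_1,\dots,k_g} \geq 0, \ \text{(8) is satisfied} \Big\} \]
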
3.1 Generalized covariances 

We introduce a generalization of the usual concept of covariance defined as follows: the
covariance of order $k$ among the variables $X_1, \dots, X_k$ is

\[ Cov[X_1, \dots, X_k] = E[(X_1 - E[X_1]) \cdots (X_k - E[X_k])] \qquad (9) \]

Under DFPE, these covariances depend only on the number of variables involved from
each exchangeable subsequence. To simplify the notation, denote the value $w_{k_1,\dots,k_g}$
when $k_i = 1$ and all other subscripts are zero as $w(i)$, i.e. $E[X_{i,j}] = w(i)$. Then under
DFPE any generalized covariance involving $k_i$ r.v.s of the $i$-th subsequence, $i = 1, \dots, g$,
is equal to

\[ Cov_{k_1,\dots,k_g} = E\big[ (X_{1,1} - w(1)) \cdots (X_{1,k_1} - w(1)) \cdots (X_{g,1} - w(g)) \cdots (X_{g,k_g} - w(g)) \big] \]

and the relation with the previous parameterization is

\[ Cov_{k_1,\dots,k_g} = \sum_{j_1=0}^{k_1} \cdots \sum_{j_g=0}^{k_g} \binom{k_1}{j_1} \cdots \binom{k_g}{j_g} (-w(1))^{j_1} \cdots (-w(g))^{j_g} \, w_{k_1-j_1,\dots,k_g-j_g} \qquad (10) \]

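Proof of (10): For the sake of simplicity, but without loss of generality, set $g = 2$. By
expanding the product,

\[ (X_{1,1} - w(1)) \cdots (X_{1,k_1} - w(1)) (X_{2,1} - w(2)) \cdots (X_{2,k_2} - w(2)) \]

results as the sum of $(k_1+1)(k_2+1)$ terms of the kind

\[ \sum_{h_1<\dots<h_i} \ \sum_{s_1<\dots<s_j} (-w(1))^{k_1-i} (-w(2))^{k_2-j} X_{1,h_1} \cdots X_{1,h_i} X_{2,s_1} \cdots X_{2,s_j} \qquad (11) \]

where the first sum ranges over all the possible $i$-tuples $(h_1, \dots, h_i)$ of distinct labels
in $\{1, \dots, k_1\}$ and consists of $\binom{k_1}{i}$ terms, the second of $\binom{k_2}{j}$ terms. Passing to the
expectation, by (5), the term (11) results as $\binom{k_1}{i}\binom{k_2}{j}(-w(1))^{k_1-i}(-w(2))^{k_2-j} w_{i,j}$, so
that

\[ Cov_{k_1,k_2} = \sum_{i=0}^{k_1} \sum_{j=0}^{k_2} \binom{k_1}{i}\binom{k_2}{j} (-1)^{k_1+k_2-i-j} \, w(1)^{k_1-i} \, w(2)^{k_2-j} \, w_{i,j} \]

which is (10) after the change of indices $j_1 = k_1 - i$, $j_2 = k_2 - j$. □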



One can prove that the inverse map, which is somewhat similar to the inverse of a
binomial transform, is

\[ w_{k_1,\dots,k_g} = \sum_{j_1=0}^{k_1} \cdots \sum_{j_g=0}^{k_g} \binom{k_1}{j_1} \cdots \binom{k_g}{j_g} \, w(1)^{j_1} \cdots w(g)^{j_g} \, Cov_{k_1-j_1,\dots,k_g-j_g} \qquad (12) \]

where $Cov_{0,\dots,0} = 1$ and all the covariances having a single 1 and all zeros in the
subscript are zero. So, an $(n_1, \dots, n_g)$-DFPE binary sequence is completely defined by the
$g$ probabilities $w(1), \dots, w(g)$ together with the generalized covariances $\{Cov_{k_1,\dots,k_g}\}$
defined for every $g$-tuple $(k_1, \dots, k_g)$ with $k_i \leq n_i$ and such that $\sum_{i=1}^{g} k_i \geq 2$. The
space of the $Cov_{k_1,\dots,k_g}$ is implicitly defined by $\Lambda_{n_1,\dots,n_g}$ and (10) and is not easily
described. We can say that the $Cov_{k_1,\dots,k_g}$ can be either positive or negative, and by (12)
they are all null if, and only if, $X_1, \dots, X_n$ are independent (and identically distributed within each subsequence).
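The passage between the $w_{k_1,k_2}$ and the generalized covariances can be sketched for $g = 2$ as follows. The sketch assumes that the direct map is (10) and that the inverse map has the binomial form obtained by writing each variable as $(X - w(i)) + w(i)$; the two-point mixing measure `atoms` and the function names are hypothetical.

```python
from fractions import Fraction
from math import comb


def cov_from_w(w, k1, k2):
    """Generalized covariance Cov_{k1,k2} from the parameters w, as in (10)."""
    m1, m2 = w[(1, 0)], w[(0, 1)]
    return sum(comb(k1, j1) * comb(k2, j2) * (-m1) ** j1 * (-m2) ** j2
               * w[(k1 - j1, k2 - j2)]
               for j1 in range(k1 + 1) for j2 in range(k2 + 1))


def w_from_cov(cov, m1, m2, k1, k2):
    """Inverse map: w_{k1,k2} from the covariances and the means m1, m2."""
    return sum(comb(k1, j1) * comb(k2, j2) * m1 ** j1 * m2 ** j2
               * cov[(k1 - j1, k2 - j2)]
               for j1 in range(k1 + 1) for j2 in range(k2 + 1))


# Hypothetical mixing measure: two equally likely pairs (theta(1), theta(2)),
# so that w_{k1,k2} = E[theta(1)^k1 * theta(2)^k2].
atoms = [(Fraction(1, 4), Fraction(1, 2)), (Fraction(3, 4), Fraction(1, 4))]
idx = [(k1, k2) for k1 in range(3) for k2 in range(3)]
w = {(k1, k2): sum(t1 ** k1 * t2 ** k2 for t1, t2 in atoms) / 2 for k1, k2 in idx}
cov = {k: cov_from_w(w, *k) for k in idx}
back = {k: w_from_cov(cov, w[(1, 0)], w[(0, 1)], *k) for k in idx}
print(cov[(2, 0)], cov[(1, 1)], back == w)
```

In this example $Cov_{2,0}$ is the variance of $\theta(1)$ under the mixing measure, which anticipates the link with central moments used for the $\infty$-extendible case.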

3.2 Extendibility 

For the sake of simplicity in this section we set $g = 2$, but all the results hold for a
general $g$.

For what we have said, we can represent any $(n_1, n_2)$-DFPE distribution as a point
in the linear spaces $\Diamond_{(n_1+1)(n_2+1)}$ and $\Lambda_{n_1,n_2}$. Formulas (7) and (6) define the linear
maps between the two spaces. Clearly these maps are one-to-one and onto and establish
the affine congruence of the two sets. The $(n_1+1)(n_2+1)$ vertices of $\Diamond_{(n_1+1)(n_2+1)}$ are the
points having one coordinate equal to one and the others equal to zero and represent
the extremal distributions of Theorem 2.1. (6) maps these vertices onto the vertices of
$\Lambda_{n_1,n_2}$. In particular, the extremal distribution having $w^{(n_1,n_2)}_{k_1,k_2} = 1$ is represented in
$\Lambda_{n_1,n_2}$ by the point $\lambda_{k_1,k_2;n_1,n_2} = (w_{l_1,l_2})_{l_1 \leq n_1,\ l_2 \leq n_2}$ having coordinates

\[ w_{l_1,l_2} = \begin{cases} 0 & \text{whenever } l_i > k_i \text{ for any } i = 1, 2 \\[4pt] \dfrac{(k_1)_{l_1}(k_2)_{l_2}}{(n_1)_{l_1}(n_2)_{l_2}} & \text{elsewhere} \end{cases} \]

The points $\{\lambda_{k_1,k_2;n_1,n_2}\}_{k_1 \leq n_1,\ k_2 \leq n_2}$ are affinely independent, so $\Lambda_{n_1,n_2}$, which is their
convex hull, is a $((n_1+1)(n_2+1)-1)$-dimensional convex polytope with $(n_1+1)(n_2+1)$
vertices, i.e. a non-standard simplex.

We say that an $(n_1, n_2)$-DFPE sequence is (at least) $(r_1, r_2)$-extendible, $r_i \geq n_i$, if
it is the initial segment of an $(r_1, r_2)$-DFPE sequence. So the sequence, represented
by the point $\mathbf{w} = (w_{l_1,l_2})_{l_1 \leq n_1,\ l_2 \leq n_2}$ in $\Lambda_{n_1,n_2}$, is $(r_1, r_2)$-extendible if, and only if, there
exists a point $\mathbf{w}^* = (w_{k_1,k_2})_{k_1 \leq r_1,\ k_2 \leq r_2}$ in $\Lambda_{r_1,r_2}$ such that its orthogonal projection over the
coordinates of $\Lambda_{n_1,n_2}$ coincides with $\mathbf{w}$. That is, denote as $\Lambda^{(n_1,n_2)}_{r_1,r_2}$ the projection of
$\Lambda_{r_1,r_2}$ over the coordinates of $\Lambda_{n_1,n_2}$, and as $\lambda^{(n_1,n_2)}_{k_1,k_2;r_1,r_2}$ the analogous projection of
$\lambda_{k_1,k_2;r_1,r_2}$. Then $\Lambda^{(n_1,n_2)}_{r_1,r_2}$ is exactly the subspace of $\Lambda_{n_1,n_2}$ representing the $(n_1, n_2)$-DFPE
distributions which are at least $(r_1, r_2)$-extendible, and it results as the convex
hull of the $\{\lambda^{(n_1,n_2)}_{k_1,k_2;r_1,r_2}\}_{k_1 \leq r_1,\ k_2 \leq r_2}$. Moreover, we are going to see that none of these points is
redundant with respect to the convex hull problem, that is, they are exactly the vertices
of $\Lambda^{(n_1,n_2)}_{r_1,r_2}$.

Theorem 3.1.

\[ \begin{aligned} \lambda^{(n_1,n_2)}_{k_1,k_2;r_1,r_2} &= \frac{r_1-k_1}{r_1} \, \lambda^{(n_1,n_2)}_{k_1,k_2;r_1-1,r_2} + \frac{k_1}{r_1} \, \lambda^{(n_1,n_2)}_{k_1-1,k_2;r_1-1,r_2} \\ &= \frac{r_2-k_2}{r_2} \, \lambda^{(n_1,n_2)}_{k_1,k_2;r_1,r_2-1} + \frac{k_2}{r_2} \, \lambda^{(n_1,n_2)}_{k_1,k_2-1;r_1,r_2-1} \end{aligned} \qquad (13) \]

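Proof. We have

\[ p^{(n_1,n_2)}_{k_1,k_2} = p^{(n_1+1,n_2)}_{k_1,k_2} + p^{(n_1+1,n_2)}_{k_1+1,k_2} \qquad (14) \]

\[ p^{(n_1,n_2)}_{k_1,k_2} = p^{(n_1,n_2+1)}_{k_1,k_2} + p^{(n_1,n_2+1)}_{k_1,k_2+1} \qquad (15) \]

The point $\lambda_{k_1,k_2;r_1,r_2}$ represents the distribution having $w^{(r_1,r_2)}_{k_1,k_2} = 1$. Any term $p^{(r_1,r_2)}_{k_1,k_2}$
appears in the right hand side of exactly one equation of the kind (14) and one of the
kind (15). Then, by (3), it is easily seen that if $w^{(r_1,r_2)}_{k_1,k_2} = 1$, it is

\[ w^{(r_1-1,r_2)}_{k_1,k_2} = \frac{r_1-k_1}{r_1}, \quad w^{(r_1-1,r_2)}_{k_1-1,k_2} = \frac{k_1}{r_1}, \quad w^{(r_1,r_2-1)}_{k_1,k_2} = \frac{r_2-k_2}{r_2}, \quad w^{(r_1,r_2-1)}_{k_1,k_2-1} = \frac{k_2}{r_2}, \]

then the statement follows by (6). □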
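The recursion of Theorem 3.1 can be checked numerically on the vertex coordinates. This is a minimal sketch, assuming the projected vertex $\lambda^{(n_1,n_2)}_{k_1,k_2;r_1,r_2}$ has coordinates $(k_1)_{l_1}(k_2)_{l_2} / ((r_1)_{l_1}(r_2)_{l_2})$ for $l_i \leq n_i$; the helper names `vertex` and `one_step_down` are hypothetical.

```python
from fractions import Fraction
from itertools import product


def falling(x, k):
    """Falling factorial (x)_k, equal to 0 when k > x >= 0 and to 1 when k = 0."""
    out = 1
    for j in range(k):
        out *= x - j
    return out


def vertex(k, r, n):
    """Coordinates w_{l1,l2}, l_i <= n_i, of the projected vertex lambda_{k;r}."""
    return {l: Fraction(falling(k[0], l[0]) * falling(k[1], l[1]),
                        falling(r[0], l[0]) * falling(r[1], l[1]))
            for l in product(range(n[0] + 1), range(n[1] + 1))}


def one_step_down(k, r, n, g):
    """Right-hand side of (13) when group g (0 or 1) loses one variable."""
    r_less = tuple(rg - (i == g) for i, rg in enumerate(r))
    k_less = tuple(kg - (i == g) for i, kg in enumerate(k))
    terms = []  # terms with a zero coefficient are skipped
    if k[g] < r[g]:
        terms.append((Fraction(r[g] - k[g], r[g]), vertex(k, r_less, n)))
    if k[g] > 0:
        terms.append((Fraction(k[g], r[g]), vertex(k_less, r_less, n)))
    return {l: sum(c * v[l] for c, v in terms) for l in terms[0][1]}


n, r = (2, 2), (4, 3)
ok = all(vertex(k, r, n) == one_step_down(k, r, n, g)
         for k in product(range(r[0] + 1), range(r[1] + 1)) for g in (0, 1))
print(ok)
```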

Proposition 3.1. Consider a polytope $A$ and a set of points lying on distinct edges of
$A$. Call $A'$ their convex hull. Then those points are the vertices of $A'$.

Proof. Suppose one of those points, $v$, lies on the edge $e$ of $A$. Then it can only be represented
as a convex combination of points in $e$, and of no other points in $A$. But $v$ is the only
point of $A'$ lying on $e$ and obviously $A' \subseteq A$, so $v$ cannot be represented as a convex
combination of any other points in $A'$, and hence it is a vertex. □



Theorem 3.2. a) The $\{\lambda^{(n_1,n_2)}_{k_1,k_2;r_1,r_2}\}_{k_1 \leq r_1,\ k_2 \leq r_2}$ are the vertices of $\Lambda^{(n_1,n_2)}_{r_1,r_2}$.
b) Each pair of points in the right hand side of (13) constitutes the vertices of an edge
of their own space.

Proof. $\Lambda_{n_1,n_2}$ is a simplex, so each couple of its vertices identifies an edge. By (13),
the points $\lambda^{(n_1,n_2)}_{k_1,k_2;n_1+1,n_2}$ lie on distinct edges of $\Lambda_{n_1,n_2}$ and by Proposition
3.1 are all vertices of $\Lambda^{(n_1,n_2)}_{n_1+1,n_2}$. Moreover, each couple of vertices of $\Lambda^{(n_1,n_2)}_{n_1+1,n_2}$ of the
kind $\lambda^{(n_1,n_2)}_{k_1,k_2;n_1+1,n_2}$, $\lambda^{(n_1,n_2)}_{k_1+1,k_2;n_1+1,n_2}$ lies on adjacent edges of $\Lambda_{n_1,n_2}$ having the
vertex $\lambda_{k_1,k_2;n_1,n_2}$ in common, and no other vertex of $\Lambda^{(n_1,n_2)}_{n_1+1,n_2}$ has $\lambda_{k_1,k_2;n_1,n_2}$ in its
representation (13). So they identify an edge of $\Lambda^{(n_1,n_2)}_{n_1+1,n_2}$. To be precise, all the points
$\lambda^{(n_1,n_2)}_{k_1,k_2;n_1+1,n_2}$ having $k_1 = 0$ or $k_1 = n_1+1$ coincide with vertices of $\Lambda_{n_1,n_2}$. However,
as we have said, there are not three points having a common vertex of $\Lambda_{n_1,n_2}$ in their
representation (13), so they are vertices of $\Lambda^{(n_1,n_2)}_{n_1+1,n_2}$ as well. In conclusion, a) and b)
are valid for $r_1 = n_1+1$, and obviously also for $r_2 = n_2+1$. It is easily seen that, if we
suppose a) and b) hold for $\Lambda^{(n_1,n_2)}_{r_1,r_2}$, then they also hold for $\Lambda^{(n_1,n_2)}_{r_1+1,r_2}$ and $\Lambda^{(n_1,n_2)}_{r_1,r_2+1}$, so
the theorem is proved by induction. □

In conclusion, an $(n_1, n_2)$-DFPE distribution, represented by a point $\mathbf{w}$ in $\Lambda_{n_1,n_2}$,
is at least $(r_1, r_2)$-extendible if, and only if, $\mathbf{w}$ is contained in $\Lambda^{(n_1,n_2)}_{r_1,r_2}$, and is exactly
$(r_1, r_2)$-extendible if $\Lambda^{(n_1,n_2)}_{r_1+1,r_2}$ and $\Lambda^{(n_1,n_2)}_{r_1,r_2+1}$ do not contain $\mathbf{w}$.

Note that, by virtue of (4), we can map the extremal points of $\Diamond_{(r_1+1)(r_2+1)}$ and find
the subspace of $\Diamond_{(n_1+1)(n_2+1)}$ representing the $(n_1, n_2)$-DFPE distributions that are at
least $(r_1, r_2)$-extendible. But the probabilities $w^{(n_1,n_2)}_{k_1,k_2}$ depend on $n_1$ and $n_2$, so we
would have to compute the vertices of the subspaces for each couple $(n_1, n_2)$. Conversely,
the probabilities $w_{k_1,k_2}$ do not depend on the sequence size, and once we know the
vertices of $\Lambda_{r_1,r_2}$ we can obtain the vertices of $\Lambda^{(n_1,n_2)}_{r_1,r_2}$ for every $n_1 \leq r_1$, $n_2 \leq r_2$
simply by excluding certain coordinates.

The points in $\Lambda^{(n,0)}_{r,0}$ represent the $n$-exchangeable distributions that are at least
$r$-extendible. The $(n-1)$-dimensional faces of an $n$-dimensional polytope are called facets.
A polytope is called simplicial if all its facets are simplexes. Crisma in [2] demonstrated
that the $\Lambda^{(n,0)}_{r,0}$ are simplicial and that their vertices satisfy the Gale evenness condition (a
combinatorial property characterizing the facets). As a consequence, we can easily
determine if a point lies inside any $\Lambda^{(n,0)}_{r,0}$. Moreover, Crisma was able to compute
their volumes, determining in some sense the proportion of $n$-exchangeable sequences
that are $r$-extendible.

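Unfortunately, the $\Lambda^{(n_1,n_2)}_{r_1,r_2}$ are not simplicial polytopes, and we have not found an
analytical way to determine their facets. Then, to determine if a point $\mathbf{w}$ lies inside a
certain polytope $\Lambda^{(n_1,n_2)}_{r_1,r_2}$ we can use the following linear program:

\[ \begin{array}{ll} \text{maximize} & \mathbf{z}^T \mathbf{w} - z_0 = f \\ \text{subject to} & \mathbf{z}^T \lambda - z_0 \leq 0 \quad \forall \lambda \in \{\lambda^{(n_1,n_2)}_{k_1,k_2;r_1,r_2}\}_{k_1 \leq r_1,\ k_2 \leq r_2} \\ & \mathbf{z}^T \mathbf{w} - z_0 \leq 1 \end{array} \qquad (16) \]

where $\mathbf{z} \in \mathbb{R}^{(n_1+1)(n_2+1)}$ and $z_0 \in \mathbb{R}$. The last inequality is artificially added so that
the linear program has a bounded solution. The optimal value $f$ is positive if and only
if there exists a hyperplane $\{\mathbf{x} \in \mathbb{R}^{(n_1+1)(n_2+1)} : \mathbf{z}^T \mathbf{x} = z_0\}$ separating the polytope
$\Lambda^{(n_1,n_2)}_{r_1,r_2}$ and $\mathbf{w}$, i.e. if and only if $\mathbf{w}$ lies outside of $\Lambda^{(n_1,n_2)}_{r_1,r_2}$.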
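The membership test can be sketched in code. The sketch below does not implement the separating-hyperplane program itself: it solves the equivalent feasibility problem of writing $\mathbf{w}$ as a convex combination of the projected vertices, using a small exact phase-1 simplex with Bland's rule so that no external solver is needed. The function names `feasible` and `in_polytope` and the two test points are hypothetical; the vertex coordinates are assumed to be $(k_1)_{l_1}(k_2)_{l_2} / ((r_1)_{l_1}(r_2)_{l_2})$.

```python
from fractions import Fraction
from itertools import product


def falling(x, k):
    """Falling factorial (x)_k = x (x-1) ... (x-k+1), with (x)_0 = 1."""
    out = 1
    for j in range(k):
        out *= x - j
    return out


def feasible(A, b):
    """True if some x >= 0 solves A x = b (phase-1 simplex, Bland's rule, exact)."""
    m, n = len(A), len(A[0])
    # Tableau rows [A | I | b]; b >= 0 is assumed (here b holds probabilities).
    T = [[Fraction(a) for a in A[i]] + [Fraction(int(i == j)) for j in range(m)]
         + [Fraction(b[i])] for i in range(m)]
    basis = [n + i for i in range(m)]
    # Reduced costs for "minimize the sum of the artificial variables";
    # the last entry stores minus the current objective value.
    z = [-sum(T[i][j] for i in range(m)) for j in range(n)] + [Fraction(0)] * m
    z.append(-sum(T[i][-1] for i in range(m)))
    while True:
        enter = next((j for j in range(n + m) if z[j] < 0), None)
        if enter is None:
            return z[-1] == 0  # feasible iff all artificials vanish
        rows = [i for i in range(m) if T[i][enter] > 0]
        leave = min(rows, key=lambda i: (T[i][-1] / T[i][enter], basis[i]))
        piv = T[leave][enter]
        T[leave] = [v / piv for v in T[leave]]
        for i in range(m):
            if i != leave and T[i][enter] != 0:
                f = T[i][enter]
                T[i] = [a - f * p for a, p in zip(T[i], T[leave])]
        f = z[enter]
        z = [a - f * p for a, p in zip(z, T[leave])]
        basis[leave] = enter


def in_polytope(w, n, r):
    """True if w = {(l1, l2): w_{l1,l2}} is a convex combination of the vertices."""
    coords = list(product(range(n[0] + 1), range(n[1] + 1)))
    verts = list(product(range(r[0] + 1), range(r[1] + 1)))
    A = [[Fraction(falling(k1, l1) * falling(k2, l2),
                   falling(r[0], l1) * falling(r[1], l2)) for (k1, k2) in verts]
         for (l1, l2) in coords]
    # The coordinate (0, 0) is identically 1, so it forces the weights to sum to 1.
    return feasible(A, [w[c] for c in coords])


half = Fraction(1, 2)
# i.i.d. fair coins in both groups: extendible to every order.
iid = {(l1, l2): half ** (l1 + l2) for l1 in range(3) for l2 in range(3)}
# Group 1 has exactly one success in two trials (not 3-extendible), group 2 is i.i.d.
a = [Fraction(1), half, Fraction(0)]
rigid = {(l1, l2): a[l1] * half ** l2 for l1 in range(3) for l2 in range(3)}
print(in_polytope(iid, (2, 2), (3, 2)),
      in_polytope(rigid, (2, 2), (2, 2)),
      in_polytope(rigid, (2, 2), (3, 2)))
```

Exact rational arithmetic avoids tolerance issues for points on the boundary; for larger $(r_1, r_2)$ a dedicated LP solver would be the more practical choice.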

3.2.1 ∞-extendible case

If all the $g$ subsequences of a DFPE sequence are $\infty$-extendible, there exists a probability
measure $\nu$ over the $g$-dimensional hypercube $[0,1]^g$ and a r.v. $\Theta = (\theta(1), \dots, \theta(g))$
distributed accordingly such that

\[ w^{(n_1,\dots,n_g)}_{k_1,\dots,k_g} = \binom{n_1}{k_1} \cdots \binom{n_g}{k_g} \int_{[0,1]^g} \prod_{i=1}^{g} \theta(i)^{k_i} (1-\theta(i))^{n_i-k_i} \, d\nu(\theta) \qquad (17) \]

So, the probabilities $w_{k_1,\dots,k_g}$ are the ordinary mixed moments of the mixing measure
$\nu$: $w_{k_1,\dots,k_g} = E_\nu[\theta(1)^{k_1} \cdots \theta(g)^{k_g}]$. Let $\mathcal{M}^{(n_1,\dots,n_g)}$ be the space of the mixed moments up to
order $(n_1, \dots, n_g)$ of all the probability measures over $[0,1]^g$. For what we have said,
$\{\Lambda^{(n_1,\dots,n_g)}_{r_1,\dots,r_g}\}_{r_1,\dots,r_g}$ is a decreasing multisequence of polytopes and we have

\[ \mathcal{M}^{(n_1,\dots,n_g)} = \bigcap_{r_1=n_1}^{\infty} \cdots \bigcap_{r_g=n_g}^{\infty} \Lambda^{(n_1,\dots,n_g)}_{r_1,\dots,r_g} \]

As far as I know there is no practical criterion to establish if a point of $\mathbb{R}^{(n_1+1)\cdots(n_g+1)}$
lies inside $\mathcal{M}^{(n_1,\dots,n_g)}$. Then we can check some simple necessary conditions for
$\infty$-extendibility using moment inequalities.

Formulas (10) and (12) link the ordinary mixed moments and the central mixed
moments of a multivariate distribution (see e.g. [19], equations (34.28) (34.29));
consequently we have:

\[ Cov_{k_1,\dots,k_g} = E_\nu\Big[ (\theta(1) - E_\nu[\theta(1)])^{k_1} \cdots (\theta(g) - E_\nu[\theta(g)])^{k_g} \Big] \]

So, a simple necessary condition for a representation of the kind (17) to hold is

\[ Cov_{2k_1,\dots,2k_g} \geq 0 \quad \forall k_i = 1, \dots, \lfloor n_i/2 \rfloor, \ i = 1, \dots, g \qquad (18) \]

To simplify the notation, denote as $Cov(i,j)$ the covariance between a r.v. of the
$i$-th group and one of the $j$-th group, i.e. the value $Cov_{k_1,\dots,k_g}$ when $k_i = k_j = 1$
(or $k_i = 2$ when $i = j$) and all other subscripts are 0. Another simple necessary condition for (17) to hold is that, in
that case, for what we have said, $\{Cov(i,j)\}_{1 \leq i \leq g,\ 1 \leq j \leq g}$ is the variance-covariance matrix
of $\Theta$ and hence must be nonnegative definite.

Example. The $(2,2)$-DFPE distribution defined by the following values of $w^{(2,2)}_{k_1,k_2}$

  k1 \ k2      0        1        2
     0       3/16     3/16      0
     1       1/16     3/16      0
     2       1/16      0       5/16

by (6) leads to the following values of $w_{k_1,k_2}$

  k1 \ k2      0        1        2
     0        1       1/2      5/16
     1       1/2     23/64     5/16
     2       3/8      5/16     5/16

and by (10) we have

\[ Cov_{2,0} = 1/8, \quad Cov_{0,2} = 1/16, \quad Cov_{2,2} = 1/32 \]

so (18) is satisfied. But $Cov_{2,0} Cov_{0,2} - Cov_{1,1}^2 = -17/4096 < 0$, so the matrix

\[ \begin{pmatrix} Cov_{2,0} & Cov_{1,1} \\ Cov_{1,1} & Cov_{0,2} \end{pmatrix} \]

is not nonnegative definite and the distribution is not $(\infty, \infty)$-extendible. The linear
program (16) reveals that the point of $\Lambda_{2,2}$ representing the distribution lies in $\Lambda^{(2,2)}_{4,2}$,
but not in $\Lambda^{(2,2)}_{5,2}$ nor in $\Lambda^{(2,2)}_{4,3}$. Hence the distribution is exactly $(4,2)$-extendible. Note
that both the 2-exchangeable subsequences identified respectively by $(w_{1,0}, w_{2,0})$ and by
$(w_{0,1}, w_{0,2})$ are $\infty$-extendible.
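The arithmetic of the example can be reproduced with a few lines of exact rational computation. The sketch assumes the table of $w^{(2,2)}_{k_1,k_2}$ has rows $(3/16, 3/16, 0)$, $(1/16, 3/16, 0)$ and $(1/16, 0, 5/16)$ and applies the falling-factorial map (6); the variable names are hypothetical.

```python
from fractions import Fraction as F

# w^(2,2)_{k1,k2}: rows k1 = 0, 1, 2 and columns k2 = 0, 1, 2.
wn = [[F(3, 16), F(3, 16), F(0)],
      [F(1, 16), F(3, 16), F(0)],
      [F(1, 16), F(0), F(5, 16)]]


def falling(x, k):
    """Falling factorial (x)_k, with (x)_0 = 1."""
    out = 1
    for j in range(k):
        out *= x - j
    return out


def moments(wn, n1=2, n2=2):
    """Map (6): w_{k1,k2} as a weighted sum of the w^(n1,n2)_{i1,i2}."""
    return [[sum(F(falling(i1, k1) * falling(i2, k2),
                   falling(n1, k1) * falling(n2, k2)) * wn[i1][i2]
                 for i1 in range(n1 + 1) for i2 in range(n2 + 1))
             for k2 in range(n2 + 1)] for k1 in range(n1 + 1)]


w = moments(wn)
m1, m2 = w[1][0], w[0][1]       # w(1) and w(2)
cov20 = w[2][0] - m1 ** 2       # variance term of the first group
cov02 = w[0][2] - m2 ** 2       # variance term of the second group
cov11 = w[1][1] - m1 * m2       # cross covariance
det = cov20 * cov02 - cov11 ** 2
print(w[1][1], cov20, cov02, cov11, det)  # 23/64 1/8 1/16 7/64 -17/4096
```

The negative determinant is what rules out a mixing measure on $[0,1]^2$, even though each subsequence taken alone passes the moment conditions.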



4 Markov exchangeability 

Consider an $I$-valued sequence $(x_1, \dots, x_n)$. Define its transition counts $n_{i,j}$ for all $i, j$
in $I$ as

\[ n_{i,j} = \sum_{k=1}^{n-1} \mathbf{1}_{(i,j)}(x_k, x_{k+1}) \]

and arrange them in a matrix $N = \{n_{i,j}\}_{i,j \in I}$. Then, the distribution of $(X_1, \dots, X_n)$
is Markov exchangeable (hereafter ME, or $n$-ME if we need to highlight the number of
variables) when the sufficient statistic $T$ in (1) is the value of the first step $x_1$, together
with the transition count matrix $N$. Introduce the number of transitions exiting from
$i$: $n_i^+ = \sum_{j \in I} n_{i,j}$, and the number of transitions entering $i$: $n_i^- = \sum_{j \in I} n_{j,i}$.

Proposition 4.1. Consider an $I$-valued sequence $(x_1, \dots, x_n)$. Then, it is $x_1 = x_n$ if,
and only if,

\[ n_i^+ = n_i^- \quad \forall i \in I \qquad (19) \]

while it is $x_1 \neq x_n$ if, and only if,

\[ \begin{cases} n_{x_1}^+ = n_{x_1}^- + 1 \\ n_{x_n}^+ = n_{x_n}^- - 1 \\ n_i^+ = n_i^- & \text{for } i \neq x_1, x_n \end{cases} \qquad (20) \]

Moreover, an integer valued matrix $N = \{n_{i,j}\}$ is a consistent transition count matrix
if, and only if, it is irreducible and one of (19) and (20) is valid.

Proof. Consider $H = \{(x_1, x_2), \dots, (x_{n-1}, x_n)\}$ and let $J$ be the set of the distinct
states ($J \subseteq I$) visited by $(x_1, \dots, x_n)$. We can think of $N$ as the adjacency matrix of
the directed graph $G = (J, H)$. But $G$ is Eulerian by construction, and the result follows
immediately. □
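The balance conditions of Proposition 4.1 are easy to check by enumeration. The following minimal sketch computes the transition count matrix of a sequence and verifies (19) or (20), whichever applies; the function names are hypothetical.

```python
from collections import Counter
from itertools import product


def transition_counts(x, states):
    """Matrix N with N[i][j] = number of transitions from states[i] to states[j]."""
    c = Counter(zip(x, x[1:]))
    return [[c[(i, j)] for j in states] for i in states]


def endpoints_consistent(x, states):
    """Check (19) when x_1 = x_n and (20) when x_1 != x_n."""
    N = transition_counts(x, states)
    out = [sum(row) for row in N]          # n_i^+ : transitions exiting i
    inn = [sum(col) for col in zip(*N)]    # n_i^- : transitions entering i
    diff = {s: out[a] - inn[a] for a, s in enumerate(states)}
    if x[0] == x[-1]:
        return all(d == 0 for d in diff.values())
    return all(d == (1 if s == x[0] else -1 if s == x[-1] else 0)
               for s, d in diff.items())


print(transition_counts((0, 1, 1, 0), (0, 1)))  # [[0, 1], [1, 1]]
print(all(endpoints_consistent(x, (0, 1, 2))
          for x in product((0, 1, 2), repeat=5)))
```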

Denote as $[x_1, N]$ the set of all the $I$-valued $n$-tuples starting in $x_1$ and having
transition count matrix $N$. Denote $P(\mathbf{x} \in [x_1, N])$ as $w_{x_1,N}$, and the probability of any
specified element of $[x_1, N]$ as $p_{x_1,N}$. Denote the set of all the distinct transition count
matrices of all the $I$-valued $n$-tuples starting in $x_1$ as $\Phi(x_1, n)$. For what we have said,
an $I$-valued $n$-ME distribution is completely defined by the probabilities $w_{x_1,N}$ for $N$
ranging in $\Phi(x_1, n)$ and $x_1$ ranging in $I$, subject to

\[ \sum_{x_1 \in I} \ \sum_{N \in \Phi(x_1,n)} w_{x_1,N} = 1 \quad \text{and} \quad \sum_{N \in \Phi(x_1,n)} w_{x_1,N} = P(X_1 = x_1) \quad \forall x_1 \in I \]

The cardinality of $[x_1, N]$ was first found by Whittle in [22]. Define the matrix $B = \{b_{i,j}\}_{i,j \in I}$ with

\[ b_{i,j} = \begin{cases} -n_{i,j}/n_i^+ & \text{for } i \neq j \\ 1 - n_{i,i}/n_i^+ & \text{for } i = j \end{cases} \]

By (19) and (20), if we know the starting state and the transition counts of a sequence,
we also know its ending state.

Theorem 4.1 ([22]). The number of sequences in $[x_1, N]$ is

\[ \det(B_{x_n,x_n}) \, \frac{\prod_{i \in I} n_i^+!}{\prod_{i,j \in I} n_{i,j}!} \]

where $x_n$ is uniquely determined by $x_1$ and $N$, and where $B_{x_n,x_n}$ is the matrix obtained
from $B$ by removing the $x_n$-th row and the $x_n$-th column.

Then it is $w_{x_1,N} = \det(B_{x_n,x_n}) \, \dfrac{\prod_{i \in I} n_i^+!}{\prod_{i,j \in I} n_{i,j}!} \, p_{x_1,N}$.

We say that an $I$-valued process $X = \{X_n\}_{n \in \mathbb{N}}$ is ME if $(X_1, \dots, X_n)$ is ME for
every $n$. Diaconis and Freedman demonstrated that a recurrent process ($X_n = X_1$ i.o.) is ME if, and
only if, its law is a mixture of Markov chains. That is, let $\mathcal{P}$ be the space of all the
stochastic matrices $\Theta = \{\theta_{i,j}\}$ on $I \times I$. Then there exists, and is unique, a mixing
measure $\nu$ on the Borel sets of $I \times \mathcal{P}$ such that

\[ P(X_1 = x_1, \dots, X_n = x_n) = \int_{\mathcal{P}} \prod_{i,j \in I} \theta_{i,j}^{\,n_{i,j}} \, \nu(x_1, d\Theta) \]

Let $\tau_i(k)$ be the step of the process $X$ at which the state $i$ occurs for the $k$-th time. Let
$V_i(k)$ be the $k$-th successor of the state $i$, i.e. the variable immediately following the
$k$-th occurrence of $i$ ($V_i(k) = X_{\tau_i(k)+1}$). The hypothesis of de Finetti was that, if all the
subsequences $\{V_i(k)\}_{k \in \mathbb{N}}$, for $i \in I$, are exchangeable and $\infty$-extendible, then $X$
is a mixture of Markov chains. This actually occurs if all the states in $I$ are recurrent,
but Lemma 5 in [13] assures that a recurrent ME process is strongly recurrent, so all
the states are recurrent and the two characterizations coincide.

Zaman in [211 [H] demonstrated that finite Markov exchangeability does not coincide
with finite exchangeability of the $\{V_i(k)\}_k$, $i \in I$. In fact, given $x_1$ and $N$,
some of the transitions in $(X_1, \dots, X_n)$ must necessarily occur last. Then, the
subsequences $\{V_i(k)\}_k$ are invariant only under permutations that do not alter those
forced transitions. Zaman described the extremal $n$-ME distributions as particular urn
processes without replacement where some balls must necessarily be drawn last,
but the characterization of the mixtures of Markov chains cannot be derived through a
passage to the limit without adding some restrictions.

4.1 Markov exchangeable binary sequences

The proofs of Theorems 4.2, 4.4 and 4.3 of this section are in the appendix.
If $I = \{0, 1\}$ we deal with $2 \times 2$ transition count matrices of the kind

\[ N = \begin{pmatrix} n_{0,0} & n_{0,1} \\ n_{1,0} & n_{1,1} \end{pmatrix} \]

and it is $n_{0,0} + n_{0,1} = n_0^+$ and $n_{1,0} + n_{1,1} = n_1^+$. The term $\det(B_{x_n,x_n})$ in Theorem 4.1
simply is $n_{1,0}/n_1^+$ if $x_n = 0$, and $n_{0,1}/n_0^+$ if $x_n = 1$. So we have

\[ w_{x_1,N} = \begin{cases} \dbinom{n_0^+}{n_{0,0}} \dbinom{n_1^+ - 1}{n_{1,1}} \, p_{x_1,N} & \text{if } (x_1, N) \text{ imply } x_n = 0 \\[8pt] \dbinom{n_0^+ - 1}{n_{0,0}} \dbinom{n_1^+}{n_{1,1}} \, p_{x_1,N} & \text{if } (x_1, N) \text{ imply } x_n = 1 \end{cases} \qquad (21) \]
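The counting factor in (21) can be compared with a brute-force enumeration of binary sequences. The sketch assumes the factor is $\binom{n_0^+}{n_{0,0}}\binom{n_1^+-1}{n_{1,1}}$ when the sequence ends in 0 and $\binom{n_0^+-1}{n_{0,0}}\binom{n_1^+}{n_{1,1}}$ when it ends in 1, with a binomial factor set to 1 when the corresponding state is never left (a convention adopted here for illustration); the function names are hypothetical.

```python
from collections import Counter
from itertools import product
from math import comb


def counts(x):
    """Transition counts (n00, n01, n10, n11) of a binary sequence."""
    c = Counter(zip(x, x[1:]))
    return (c[(0, 0)], c[(0, 1)], c[(1, 0)], c[(1, 1)])


def class_size(N, last):
    """Number of sequences with transition counts N and ending state `last`."""
    n00, n01, n10, n11 = N
    n0, n1 = n00 + n01, n10 + n11
    if last == 0:
        return comb(n0, n00) * (comb(n1 - 1, n11) if n1 > 0 else 1)
    return (comb(n0 - 1, n00) if n0 > 0 else 1) * comb(n1, n11)


# Group all binary sequences of length 7 by (first state, N, last state);
# the last state is determined by the first state and N.
sizes = Counter((x[0], counts(x), x[-1]) for x in product((0, 1), repeat=7))
print(all(class_size(N, last) == c for (_, N, last), c in sizes.items()))
```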
We can consider separately the sequences depending on the initial state. From now
on, we fix $P(X_1 = 0) = 1$ and hence we will consider only the sequences starting
with 0 and the probabilities $\{w_{0,N}\}_{N \in \Phi(0,n)}$ and $\{p_{0,N}\}_{N \in \Phi(0,n)}$. We will also use the
self-explaining notation $p_0\left(\begin{smallmatrix} n_{0,0} & n_{0,1} \\ n_{1,0} & n_{1,1} \end{smallmatrix}\right)$ and $w_0\left(\begin{smallmatrix} n_{0,0} & n_{0,1} \\ n_{1,0} & n_{1,1} \end{smallmatrix}\right)$ when we need to display the
number of transitions.

Unlike the DFPE case, the number of probabilities defining an $n$-ME distribution is
not so evident. We have to count the possible different transition count matrices for
each fixed starting state. From (19) and (20) two cases are possible when $X_1 = 0$, and
we define

\[ \Phi_1(0,n) = \{N \in \Phi(0,n) : n_{0,1} = n_{1,0}\} \]
\[ \Phi_2(0,n) = \{N \in \Phi(0,n) : n_{0,1} = n_{1,0} + 1\} \]

such that $\Phi_1(0,n) \cup \Phi_2(0,n) = \Phi(0,n)$. Call the transition count matrices of $\Phi_1(0,n)$
matrices of the first kind, and those of $\Phi_2(0,n)$ of the second kind. The following
theorem corrects the assertion |$(0, n)\ — 1 + ("2 ^) stated in a different form in page 
239] and reported in [IS] 

Theorem 4.2.

\[ |\Phi(0,n)| = 1 + \binom{n}{2} \]

For symmetry reasons the same result is valid for the sequences starting in 1.
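The count of Theorem 4.2 can be confirmed for small $n$ by enumerating the distinct transition count matrices directly. This is only a numerical check, not a proof; the function name `phi_size` is hypothetical.

```python
from collections import Counter
from itertools import product
from math import comb


def counts(x):
    """Transition counts (n00, n01, n10, n11) of a binary sequence."""
    c = Counter(zip(x, x[1:]))
    return (c[(0, 0)], c[(0, 1)], c[(1, 0)], c[(1, 1)])


def phi_size(n, start=0):
    """Number of distinct transition count matrices of n-tuples starting in `start`."""
    return len({counts((start,) + tail) for tail in product((0, 1), repeat=n - 1)})


print([phi_size(n) for n in range(2, 8)])  # [2, 4, 7, 11, 16, 22]
```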
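Now we state a couple of equations we will use in the following. For any $n$ and $N$ we
have

\[ p_{0,N} = p_0\left(\begin{smallmatrix} n_{0,0} & n_{0,1} \\ n_{1,0} & n_{1,1} \end{smallmatrix}\right) = p_0\left(\begin{smallmatrix} n_{0,0}+1 & n_{0,1} \\ n_{1,0} & n_{1,1} \end{smallmatrix}\right) + p_0\left(\begin{smallmatrix} n_{0,0} & n_{0,1}+1 \\ n_{1,0} & n_{1,1} \end{smallmatrix}\right) \quad \text{if } N \in \Phi_1(0,n) \qquad (22) \]

\[ p_{0,N} = p_0\left(\begin{smallmatrix} n_{0,0} & n_{0,1} \\ n_{1,0} & n_{1,1} \end{smallmatrix}\right) = p_0\left(\begin{smallmatrix} n_{0,0} & n_{0,1} \\ n_{1,0}+1 & n_{1,1} \end{smallmatrix}\right) + p_0\left(\begin{smallmatrix} n_{0,0} & n_{0,1} \\ n_{1,0} & n_{1,1}+1 \end{smallmatrix}\right) \quad \text{if } N \in \Phi_2(0,n) \qquad (23) \]

The first $k$ steps $(X_1, \dots, X_k)$, $k < n$, of an $n$-ME sequence are $k$-ME, and we
can obtain all the probabilities $\{p_{0,K}\}_{K \in \Phi(0,k)}$ from the $\{p_{0,N}\}_{N \in \Phi(0,n)}$. Let
$K = \left(\begin{smallmatrix} k_{0,0} & k_{0,1} \\ k_{1,0} & k_{1,1} \end{smallmatrix}\right)$ be the transition count matrix up to step $k$ of a sequence starting in 0, and
let $k_{0,0} + k_{0,1} = k_0^+$ and $k_{1,0} + k_{1,1} = k_1^+$. Then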
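Theorem 4.3.

\[ p_{0,K} = \sum_{N \in \Phi_1(0,n)} \frac{(n_{0,0})_{k_{0,0}} (n_{0,1})_{k_{0,1}} (n_{1,1})_{k_{1,1}} (n_{1,0}-1)_{k_{1,0}}}{(n_0^+)_{k_0^+} (n_1^+ - 1)_{k_1^+}} \, w_{0,N} + \sum_{N \in \Phi_2(0,n)} \frac{(n_{0,0})_{k_{0,0}} (n_{0,1}-1)_{k_{0,1}} (n_{1,1})_{k_{1,1}} (n_{1,0})_{k_{1,0}}}{(n_0^+ - 1)_{k_0^+} (n_1^+)_{k_1^+}} \, w_{0,N} \]

where the sums should be restricted to those matrices $N$ in $\Phi(0,n)$ having $n_{i,j} \geq
k_{i,j}$ for all $i, j$ in $\{0,1\}$. Consider the probability $p_0\left(\begin{smallmatrix} a & 1 \\ 0 & b \end{smallmatrix}\right)$ of having the sequence of
$a + b + 2$ steps starting in 0 with $a$ transitions $(0,0)$, a single transition $(0,1)$ and ending
with $b$ transitions $(1,1)$, and denote it $w_{0,a,b}$. By the above theorem we have

\[ w_{0,a,b} = \sum_{N \in \Phi_1(0,n)} \frac{(n_{0,0})_a \, n_{0,1} \, (n_{1,1})_b}{(n_0^+)_{a+1} (n_1^+ - 1)_b} \, w_{0,N} + \sum_{N \in \Phi_2(0,n)} \frac{(n_{0,0})_a \, (n_{0,1} - 1) \, (n_{1,1})_b}{(n_0^+ - 1)_{a+1} (n_1^+)_b} \, w_{0,N} \qquad (24) \]

We set $w_{0,n-1,0} = p_0\left(\begin{smallmatrix} n-1 & 0 \\ 0 & 0 \end{smallmatrix}\right)$.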

Define the operators $\Delta_0$ and $\Delta_1$ such that:

\[ \Delta_0 (w_{0,a,b}) = w_{0,a+1,b} - w_{0,a,b} \quad \text{and} \quad \Delta_1 (w_{0,a,b}) = w_{0,a,b+1} - w_{0,a,b} \]

Then we have 
Theorem 4.4. 

In an $n$-ME sequence the probabilities $\{w_{0,a,b}\}$ are well defined for every couple of
nonnegative integers $(a, b)$ having sum not greater than $n - 2$, together with the case
$w_{0,n-1,0}$. Denote as $\mathcal{L}_n$ the set of couples $(a, b)$ so defined together with the couple
$(n-1, 0)$. Theorem 4.4 assures that the probabilities $\{w_{0,a,b}\}_{\mathcal{L}_n}$ suffice to completely
define an $n$-ME sequence starting in 0. It is easily seen that $|\mathcal{L}_n| = \binom{n}{2} + 1$, as we would
expect.
4.2 Extendibility

Unlike the DFPE case, in an ME sequence it is meaningless to consider separately the
extendibility of the two subsequences $\{V_0(k)\}_k$ and $\{V_1(k)\}_k$. Then we say that an
$n$-ME sequence $(X_1, \dots, X_n)$ is $r$-extendible if there exist $(X_{n+1}, \dots, X_r)$ such that
$(X_1, \dots, X_r)$ is $r$-ME.

The probabilities $w_{0,a,b}$ allow us to study the extendibility of an ME sequence with a
geometric approach analogous to that of Section 3.2.

The space of the probabilities $\{w_{0,a,b}\}_{\mathcal{L}_n}$ of the $n$-ME sequences starting in 0
(call it $\Gamma_n$) is implicitly defined by Theorem 4.4. That is, we have that every $w_{0,a,b}$
should satisfy

\[ (-1)^{c+d-1} \Delta_0^{c-1} \Delta_1^{d} (w_{0,a,b}) \geq 0 \quad \forall (c,d) : \left(\begin{smallmatrix} a & c \\ d & b \end{smallmatrix}\right) \in \Phi(0,n) \qquad (25) \]

and we can write

\[ \Gamma_n = \Big\{ (w_{0,a,b})_{\mathcal{L}_n} : w_{0,a,b} \geq 0, \ \text{(25) is satisfied} \Big\} \]

Theorem 4.4 and (24) establish the affine congruence between the unitary $\binom{n}{2}$-dimensional
simplex $\Diamond_{\binom{n}{2}+1}$, which is the space of the probabilities $\{w_{0,N}\}_{N \in \Phi(0,n)}$, and $\Gamma_n$, which
consequently is a $\binom{n}{2}$-dimensional (non-standard) simplex. The vertices of $\Diamond_{\binom{n}{2}+1}$
represent the extremal distributions of Theorem 2.1. Equation (24) maps them to the
vertices of $\Gamma_n$. We will denote as $\gamma_N$ the vertex of $\Gamma_n$ corresponding to the extremal
distribution having $w_{0,N} = 1$.
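The signed differences appearing in (25) can be illustrated on the simplest case of a single Markov chain started in 0, i.e. a degenerate mixing measure (an assumption made only for this sketch). For such a chain $w_{0,a,b} = \theta_{0,0}^{a}(1-\theta_{0,0})\theta_{1,1}^{b}$, and the signed difference reduces to $\theta_{0,0}^{a}(1-\theta_{0,0})^{c}\theta_{1,1}^{b}(1-\theta_{1,1})^{d}$, which is nonnegative. The function names and parameter values are hypothetical.

```python
from fractions import Fraction
from math import comb


def w0(a, b, t00, t11):
    """w_{0,a,b} for one Markov chain: a steps 0->0, one step 0->1, b steps 1->1."""
    return t00 ** a * (1 - t00) * t11 ** b


def signed_difference(w, a, b, c, d):
    """(-1)^(c+d-1) Delta_0^(c-1) Delta_1^d applied to w at (a, b), for c >= 1."""
    total = 0
    for i in range(c):             # expansion of Delta_0^(c-1)
        for j in range(d + 1):     # expansion of Delta_1^d
            total += ((-1) ** (i + j) * comb(c - 1, i) * comb(d, j)
                      * w(a + i, b + j))
    return total


t00, t11 = Fraction(1, 3), Fraction(1, 4)   # hypothetical transition probabilities
value = signed_difference(lambda a, b: w0(a, b, t00, t11), 2, 1, 3, 2)
print(value == t00 ** 2 * (1 - t00) ** 3 * t11 * (1 - t11) ** 2)
```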

An $n$-ME sequence starting in 0, represented in $\Gamma_n$ by the point $(w_{0,a,b})_{\mathcal{L}_n}$, is
$r$-extendible if, and only if, there exist probabilities $\{w_{0,a,b}\}$ with $(n-2) < (a+b) \leq (r-2)$,
together with $w_{0,r-1,0}$, such that $(w_{0,a,b})_{\mathcal{L}_r}$ lies in $\Gamma_r$. Let $\Gamma_r^{(n)}$ be the orthogonal
projection of $\Gamma_r$ over the coordinates of $\Gamma_n$, and let $\gamma_R^{(n)}$ be the analogous projection of
$\gamma_R$. Then $\Gamma_r^{(n)}$ represents the $n$-ME sequences that are (at least) $r$-extendible and is
the convex hull of the $\{\gamma_R^{(n)}\}_{R \in \Phi(0,r)}$.

By ([2T|l , ((22|l , ([23|) , (p4| , and with passages similar to those of the proof of Theorem 
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13.11 one can prove the following is valid for any r > n: 





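$$
\eta^{(n)}_{\left(\begin{smallmatrix} r_{0,0} & r_{0,1} \\ r_{1,0} & r_{1,1} \end{smallmatrix}\right)} =
\begin{cases}
(\cdots)\,\eta^{(n)}_{\left(\begin{smallmatrix} r_{0,0} & r_{0,1} \\ r_{1,0}-1 & r_{1,1} \end{smallmatrix}\right)}
+ (\cdots)\,\eta^{(n)}_{\left(\begin{smallmatrix} r_{0,0}-1 & r_{0,1} \\ r_{1,0} & r_{1,1} \end{smallmatrix}\right)}
& \text{if } R \in \Phi_1(0,r) \\[2ex]
(\cdots)\,\eta^{(n)}_{\left(\begin{smallmatrix} r_{0,0} & r_{0,1}-1 \\ r_{1,0} & r_{1,1} \end{smallmatrix}\right)}
+ (\cdots)\,\eta^{(n)}_{\left(\begin{smallmatrix} r_{0,0} & r_{0,1} \\ r_{1,0} & r_{1,1}-1 \end{smallmatrix}\right)}
& \text{if } R \in \Phi_2(0,r)
\end{cases}
\tag{26}
$$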



As a consequence, $\Gamma_{r+1}^{(n)}$ is embedded in $\Gamma_r^{(n)}$ and $\{\Gamma_r^{(n)}\}_r$ is a nested sequence of convex 
polytopes. To verify whether a point representing a distribution lies inside a certain 
polytope, and so establish its extendibility, we can use a linear program analogous to (6). 
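The nesting of the polytopes can be illustrated by brute force. The following is only a minimal sketch, under the assumption that the vertex attached to a transition count matrix $R$ is the uniform distribution over the sequences starting in 0 and consistent with $R$. The helper names (`transition_matrix`, `classes`, `one_step_weights`) are hypothetical and not part of the paper. Dropping the last symbol of a sequence drawn uniformly from the class of $R$ yields a mixture of uniform distributions over classes of length $r-1$, so each projected vertex of length $r$ is a convex combination of projected vertices of length $r-1$.

```python
from collections import Counter, defaultdict
from fractions import Fraction
from itertools import product


def transition_matrix(seq):
    """Return the transition counts (n00, n01, n10, n11) of a binary sequence."""
    counts = [[0, 0], [0, 0]]
    for i, j in zip(seq, seq[1:]):
        counts[i][j] += 1
    return (counts[0][0], counts[0][1], counts[1][0], counts[1][1])


def classes(length):
    """Group the binary sequences of the given length that start in 0
    by their transition count matrix."""
    groups = defaultdict(list)
    for tail in product((0, 1), repeat=length - 1):
        seq = (0,) + tail
        groups[transition_matrix(seq)].append(seq)
    return groups


def one_step_weights(r):
    """For every class R of length r, return the mixture weights over the
    classes of length r - 1 obtained by dropping the last symbol of a
    sequence drawn uniformly from R (assumed form of the extremal law)."""
    shorter = classes(r - 1)
    weights = {}
    for R, seqs in classes(r).items():
        prefixes = Counter(seq[:-1] for seq in seqs)
        by_class = Counter(transition_matrix(p) for p in prefixes)
        for Rp, count in by_class.items():
            # The prefixes falling in class Rp are exactly the members of Rp,
            # each hit once, so the marginal restricted to Rp is uniform.
            assert count == len(shorter[Rp])
            assert all(prefixes[p] == 1 for p in shorter[Rp])
        weights[R] = {Rp: Fraction(c, len(seqs)) for Rp, c in by_class.items()}
    return weights
```

For instance, `one_step_weights(4)[(1, 1, 1, 0)]` splits the class containing 0010 and 0100 evenly between the classes of 001 and 010. The weights are ratios of class sizes, which is one way to see why a linear program over the projected vertices suffices to test extendibility.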

We have computationally calculated the volume of some of the polytopes $\Gamma_r^{(n)}$. We 
consider the ratio of the volume of $\Gamma_r^{(n)}$ to the volume of $\Gamma_n$ as an index of the proportion 
of $n$-ME distributions that are $r$-extendible, as has been done in [2] and [23] for the 
exchangeable case, and we report some values in Table 1. By (26) one can see that, 
unlike the DFPE case, not all the points $\eta_R^{(n)}$ are vertices of $\Gamma_r^{(n)}$, as some of them are 
redundant. A strange consequence is that $\Gamma_r^{(3)} = \Gamma_3$ for any $r$, so in Table 1 we start 
with $n = 4$. 



Table 1: Values of $\mathrm{Vol}(\Gamma_r^{(n)}) / \mathrm{Vol}(\Gamma_n)$ for different values of $n$ and $r$. The entries 
relative to $n = 6, 7, 8$ with $r = 10$ are missing since it seems computationally intractable 
to find the relative volume of $\Gamma_r^{(n)}$. 

| n \ r | 5    | 6      | 7      | 8      | 9      | 10     |
|-------|------|--------|--------|--------|--------|--------|
| 4     | 0.75 | 0.6667 | 0.6024 | 0.5504 | 0.5105 | 0.4778 |
| 5     |      | 0.4445 | 0.2860 | 0.2018 | 0.1454 | 0.1091 |
| 6     |      |        | 0.1929 | 0.0738 | 0.0336 |        |
| 7     |      |        |        | 0.0625 | 0.0111 |        |
| 8     |      |        |        |        | 0.0146 |        |
| 9     |      |        |        |        |        | 0.0025 |

4.2.1 ∞-extendible case 

An $\infty$-extendible $n$-ME sequence is not necessarily the initial segment of a mixture of 
Markov Chains. As pointed out in [9], an infinite ME sequence starting in 0 is a mixture 
of two kinds of processes: recurrent Markov Chains and processes that deterministically 
begin with a streak of zeros, make a single (0, 1) transition and end with all ones. But 
if, as $n \to \infty$, both $n_0$ and $n_1$ go to infinity, there exists a unique mixing measure $\nu$ 
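over $[0,1]^2$ and a couple $(\theta_{0,0}, \theta_{1,1})$ such that, conditionally on $X_1 = 0$, 

$$P(X_2 = x_2, \ldots, X_n = x_n \mid X_1 = 0) = \int_0^1\!\!\int_0^1 \theta_{0,0}^{n_{0,0}} (1-\theta_{0,0})^{n_{0,1}} (1-\theta_{1,1})^{n_{1,0}} \theta_{1,1}^{n_{1,1}} \, d\nu(\theta_{0,0}, \theta_{1,1}).$$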

Denote the indicator function of the event {the $k$-th successor of $i$ is $j$} as $Y_{i,j}(k)$: 
$\mathbf{1}_j(V_i(k)) = Y_{i,j}(k)$ for all $i, j$ in $\{0, 1\}$. Then we can write 

$$w_{0,a,b} = E\left[(1 - X_1) \cdot Y_{0,0}(1) \cdots Y_{0,0}(a) \cdot \left(1 - Y_{0,0}(a+1)\right) \cdot Y_{1,1}(1) \cdots Y_{1,1}(b)\right]$$



When we consider sequences starting both in 0 and 1, we introduce the probabilities 
$w_{1,a,b}$, defined as the probabilities of having the sequence starting in 1 with $b$ transitions 
(1, 1), a single transition (1, 0), and ending with $a$ transitions (0, 0). Then we have: 

$$w_{1,a,b} = E\left[X_1 \cdot Y_{1,1}(1) \cdots Y_{1,1}(b) \cdot \left(1 - Y_{1,1}(b+1)\right) \cdot Y_{0,0}(1) \cdots Y_{0,0}(a)\right]$$

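In a mixture of Markov Chains it is 

$$w_{0,a,b} = E_\nu\left[(1 - X_1)\, \theta_{0,0}^{a}\, \theta_{1,1}^{b}\right] - E_\nu\left[(1 - X_1)\, \theta_{0,0}^{a+1}\, \theta_{1,1}^{b}\right]$$
$$w_{1,a,b} = E_\nu\left[X_1\, \theta_{0,0}^{a}\, \theta_{1,1}^{b}\right] - E_\nu\left[X_1\, \theta_{0,0}^{a}\, \theta_{1,1}^{b+1}\right]$$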

So, unlike the DFPE case, we do not have the mixed moments of the mixing distribution, 
but differences involving them. It is easily seen that it is not possible to single out 
the mixed moments from the probabilities $w_{0,a,b}$ and $w_{1,a,b}$. However, let $N$ be the 
transition count matrix of $(X_1, \ldots, X_n)$ intended as a r.v. Then, if the ME distribution 
is such that $X_1$ and $N$ are independent, we can obtain them. Define 

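$$m_{a,b} = E\left[Y_{0,0}(1) \cdots Y_{0,0}(a) \cdot Y_{1,1}(1) \cdots Y_{1,1}(b)\right]$$
and let $P(X_1 = i) = q_i$. Under independence of $X_1$ and $N$ we have 

$$\frac{w_{0,a,b}}{q_0} = m_{a,b} - m_{a+1,b} \quad \text{and} \quad \frac{w_{1,a,b}}{q_1} = m_{a,b} - m_{a,b+1}.$$

Then we have $m_{1,b} = m_{0,b} - (w_{0,0,b}/q_0)$, and in general, by recurrence, 

$$m_{a,b} = m_{a-1,b} - \frac{w_{0,a-1,b}}{q_0}.$$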
So an $n$-ME distribution such that $X_1$ and $N$ are independent is defined by the quantities 
$m_{a,b}$, for every couple $(a, b)$ such that $a + b \le n - 1$. In a mixture of Markov Chains 
it is $m_{a,b} = E_\nu\left[\theta_{0,0}^{a}\, \theta_{1,1}^{b}\right]$, and we can formulate generalized covariances as in (…) and 
(…), and state simple necessary conditions for $\infty$-extendibility as in the DFPE case. 
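The recurrence for the moments can be checked with a round trip on a toy mixture. The sketch below is illustrative only: the two-atom mixing measure, the value of $q_0$ and all function names are hypothetical choices. It further assumes that the first row $m_{0,b}$ is obtained from the $w_{1,0,b}$ through the second relation, starting from $m_{0,0} = 1$.

```python
from fractions import Fraction

# Hypothetical mixing measure: atoms (theta00, theta11) with their weights.
ATOMS = [
    ((Fraction(1, 4), Fraction(1, 2)), Fraction(1, 3)),
    ((Fraction(2, 3), Fraction(1, 5)), Fraction(2, 3)),
]
Q0 = Fraction(2, 5)  # assumed P(X_1 = 0), taken independent of the mixing measure
Q1 = 1 - Q0


def moment(a, b):
    """Mixed moment E[theta00^a * theta11^b] of the toy mixing measure."""
    return sum(wt * t0**a * t1**b for (t0, t1), wt in ATOMS)


def w0(a, b):
    """w_{0,a,b} under independence of X_1 and the transition counts."""
    return Q0 * (moment(a, b) - moment(a + 1, b))


def w1(a, b):
    """w_{1,a,b} under independence of X_1 and the transition counts."""
    return Q1 * (moment(a, b) - moment(a, b + 1))


def recover_moments(n):
    """Rebuild m_{a,b} for a + b <= n - 1 using only the probabilities w."""
    m = {(0, 0): Fraction(1)}
    for b in range(1, n):          # first row, from the sequences starting in 1
        m[(0, b)] = m[(0, b - 1)] - w1(0, b - 1) / Q1
    for a in range(1, n):          # remaining rows, by recurrence on a
        for b in range(0, n - a):
            m[(a, b)] = m[(a - 1, b)] - w0(a - 1, b) / Q0
    return m
```

Calling `recover_moments(4)` returns the ten values $m_{a,b}$ with $a + b \le 3$, which agree exactly with `moment(a, b)` since the arithmetic is done with `Fraction`.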

5 Concluding remarks 

From what we have said, in each of the exchangeable, DFPE and ME cases, the $\infty$-extendible 
sequences form a particular subset of all the sequences of a fixed length. Then, in the 
inferential analysis of binary data, one can look for distributions which do not need the 
assumption of $\infty$-extendibility as an alternative to mixtures of i.i.d. processes and mixtures 
of Markov Chains. So, a preliminary analysis of the extendibility of the data 
at hand (i.e. of their empirical distribution) can give some evidence against a mixture 
model, and the present paper gives the tools for this purpose. 

Gupta in [15], [16] looked for an extension of Hausdorff's moment problem to 
distributions over the simplex, and implicitly found the necessary and sufficient conditions 
for the extendibility of an exchangeable finite sequence taking values in a finite state 
space, with the same geometric interpretation we have given. Combining his results 
with those of Section 3, one can easily find the conditions for the extendibility of DFPE 
sequences when the variables assume more than two values. It seems hard to find an 
analogous extension for the ME case. 

Appendix A. Proof of Theorem 4.2 

We first find $|\Phi_1(0, n)|$. In a sequence of length $n$ we have $n - 1$ transitions. For every 
fixed value of $n_{1,0} = n_{0,1}$, equal to $k$, say, the couple $(n_{1,1}, n_{0,0})$ can assume all the 
possible values such that $n_{1,1} + n_{0,0} = n - 1 - 2k$, whose number is $(n - 2k)$. The 
possible values for $k = n_{1,0} = n_{0,1}$ range in $0, 1, \ldots, \lfloor (n-1)/2 \rfloor$, where $\lfloor (n-1)/2 \rfloor$ is 
the integer part of $(n-1)/2$. In the special case $n_{1,0} = n_{0,1} = 0$ we have only one 
matrix, $\left(\begin{smallmatrix} n-1 & 0 \\ 0 & 0 \end{smallmatrix}\right)$. So we have 

$$|\Phi_1(0, n)| = 1 + \sum_{k=1}^{\lfloor (n-1)/2 \rfloor} (n - 2k).$$

Now consider the following two arguments: 
• All the sequences consistent with a matrix in $\Phi_2(0, n)$ start in 0 and end in 1. If 
we add a transition (1, 0) at the end of any such sequence, its transition count 
matrix belongs to $\Phi_1(0, n+1)$. 

• If we reduce by one the number of transitions (1, 0) in a matrix of $\Phi_1(0, n+1)$, we 
obtain a matrix of $\Phi_2(0, n)$. 

Consequently, each matrix of the second kind can be constructed from one of the first kind 
that is one step longer, as long as $n_{1,0}$ is not null. Then we have to exclude the matrix having 
$n_{1,0} = n_{0,1} = 0$, and it is: 

$$|\Phi_2(0, n)| = |\Phi_1(0, n+1)| - 1 = \sum_{k=1}^{\lfloor n/2 \rfloor} (n + 1 - 2k).$$

Clearly it is $|\Phi(0, n)| = |\Phi_1(0, n)| + |\Phi_2(0, n)|$, that is 

$$|\Phi(0, n)| = 1 + \sum_{k=1}^{\lfloor (n-1)/2 \rfloor} (n - 2k) + \sum_{k=1}^{\lfloor n/2 \rfloor} (n - 2k + 1) = 1 + \sum_{k=1}^{n-1} (n - k) = 1 + \binom{n}{2}.$$
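The counting argument can be cross-checked by enumeration for small $n$. The following sketch (the function names are hypothetical) lists all binary sequences of length $n$ starting in 0, collects their distinct transition count matrices by final state, and compares the counts with the three formulas above.

```python
from itertools import product
from math import comb


def transition_matrix(seq):
    """Return the transition counts (n00, n01, n10, n11) of a binary sequence."""
    counts = [[0, 0], [0, 0]]
    for i, j in zip(seq, seq[1:]):
        counts[i][j] += 1
    return (counts[0][0], counts[0][1], counts[1][0], counts[1][1])


def phi(n):
    """Distinct transition count matrices of length-n sequences starting in 0,
    split by final state: first kind ends in 0, second kind ends in 1."""
    first, second = set(), set()
    for tail in product((0, 1), repeat=n - 1):
        seq = (0,) + tail
        (first if seq[-1] == 0 else second).add(transition_matrix(seq))
    return first, second


def check_counts(max_n=10):
    """Compare the brute-force counts with the closed formulas of the proof."""
    for n in range(2, max_n + 1):
        first, second = phi(n)
        assert len(first) == 1 + sum(n - 2 * k for k in range(1, (n - 1) // 2 + 1))
        assert len(second) == sum(n + 1 - 2 * k for k in range(1, n // 2 + 1))
        assert len(first) + len(second) == 1 + comb(n, 2)
    return True
```

For $n = 4$ the enumeration finds 3 matrices of the first kind and 4 of the second kind, that is $7 = 1 + \binom{4}{2}$ in total.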

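Appendix B. Proof of Theorem 4.4 

$w_{0,a,b}$ is the probability $p_0\left(\begin{smallmatrix} a & 1 \\ 0 & b \end{smallmatrix}\right)$ of having a sequence starting in 0 and ending in 1. 
Then by (…) we have 

$$\begin{aligned} w_{0,a,b} = p_0\begin{pmatrix} a & 1 \\ 0 & b \end{pmatrix} &= p_0\begin{pmatrix} a & 1 \\ 1 & b \end{pmatrix} + p_0\begin{pmatrix} a & 1 \\ 0 & b+1 \end{pmatrix} \\ &= p_0\begin{pmatrix} a & 1 \\ 1 & b \end{pmatrix} + w_{0,a,b+1}. \end{aligned}$$

Then it follows that 

$$p_0\begin{pmatrix} a & 1 \\ 1 & b \end{pmatrix} = -\Delta_1(w_{0,a,b}) \tag{27}$$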

so we can derive the probability of having any sequence starting in 0 and consistent 
with the transition count matrix $\left(\begin{smallmatrix} a & 1 \\ 1 & b \end{smallmatrix}\right)$. These sequences end in 0, so by (22) we have 

$$p_0\begin{pmatrix} a & 2 \\ 1 & b \end{pmatrix} = p_0\begin{pmatrix} a & 1 \\ 1 & b \end{pmatrix} - p_0\begin{pmatrix} a+1 & 1 \\ 1 & b \end{pmatrix}. \tag{28}$$

We have just demonstrated that all the terms on the right hand side of (28) can be 
derived from (27), and it is $p_0\left(\begin{smallmatrix} a & 2 \\ 1 & b \end{smallmatrix}\right) = \Delta_0(\Delta_1(w_{0,a,b}))$. So, we can derive the probability 
of any sequence starting in 0 and consistent with the transition count matrix $\left(\begin{smallmatrix} a & 2 \\ 1 & b \end{smallmatrix}\right)$. 
For an $n$-ME sequence starting in 0, it is always $n_{0,1} = n_{1,0}$ or $n_{0,1} = n_{1,0} + 1$. So, 
repeating the previous passages, by recurrence, we obtain: 



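$$p_0(N) = p_0\begin{pmatrix} n_{0,0} & n_{0,1} \\ n_{1,0} & n_{1,1} \end{pmatrix} =
\begin{cases}
-\Delta_1 (\Delta_0 \Delta_1)^{n_{1,0}-1} (w_{0,n_{0,0},n_{1,1}}) & \text{if } N \in \Phi_1(0,n) \\[1ex]
(\Delta_0 \Delta_1)^{n_{1,0}} (w_{0,n_{0,0},n_{1,1}}) & \text{if } N \in \Phi_2(0,n)
\end{cases}$$

which is equivalent to Theorem 4.4. 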
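The recurrence can be sanity-checked on a single Markov Chain, for which every probability is explicit. The sketch below is not the paper's procedure: it assumes that $\Delta_0$ and $\Delta_1$ are forward differences in $a$ and $b$ respectively, it uses hypothetical transition probabilities `T0`, `T1`, and it skips the all-zeros sequence, which is not expressed through the $w_{0,a,b}$.

```python
from fractions import Fraction
from itertools import product

# Hypothetical chain: T0 = P(0 -> 0), T1 = P(1 -> 1), started in 0.
T0, T1 = Fraction(1, 3), Fraction(3, 4)


def w(a, b):
    """w_{0,a,b}: a transitions (0,0), one (0,1), then b transitions (1,1)."""
    return T0**a * (1 - T0) * T1**b


def delta0(f):
    """Forward difference in the first argument (assumed meaning of Delta_0)."""
    return lambda a, b: f(a + 1, b) - f(a, b)


def delta1(f):
    """Forward difference in the second argument (assumed meaning of Delta_1)."""
    return lambda a, b: f(a, b + 1) - f(a, b)


def p0_from_w(n00, n01, n10, n11):
    """Probability of one sequence with the given counts, from differences of w."""
    if n01 == 0:
        raise ValueError("the all-zeros sequence is not covered by w_{0,a,b}")
    first_kind = n01 == n10            # first kind: the sequence ends in 0
    g = w
    for _ in range(n10 - 1 if first_kind else n10):
        g = delta0(delta1(g))
    if first_kind:
        return -delta1(g)(n00, n11)
    return g(n00, n11)


def p0_direct(n00, n01, n10, n11):
    """Same probability computed directly from the transition probabilities."""
    return T0**n00 * (1 - T0)**n01 * (1 - T1)**n10 * T1**n11


def counts(seq):
    """Transition counts (n00, n01, n10, n11) of a binary sequence."""
    c = [[0, 0], [0, 0]]
    for i, j in zip(seq, seq[1:]):
        c[i][j] += 1
    return (c[0][0], c[0][1], c[1][0], c[1][1])


def check(n):
    """Compare both computations on every length-n sequence starting in 0."""
    for tail in product((0, 1), repeat=n - 1):
        c = counts((0,) + tail)
        if c[1] > 0:
            assert p0_from_w(*c) == p0_direct(*c)
    return True
```

Because each forward difference of $\theta^a$ contributes a factor $(\theta - 1)$, the signs in the two cases are exactly those needed to make the result a product of transition probabilities, which is what `check(6)` confirms on this toy chain.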



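Appendix C. Proof of Theorem 4.3 

Let $K$ be the transition count matrix of the first $k$ steps of the sequence. The number 
of sequences $(x_1, \ldots, x_n) \in \{0, 1\}^n$, with $x_1 = 0$, such that $K = \left(\begin{smallmatrix} k_{0,0} & k_{0,1} \\ k_{1,0} & k_{1,1} \end{smallmatrix}\right)$ and $N = 
\left(\begin{smallmatrix} n_{0,0} & n_{0,1} \\ n_{1,0} & n_{1,1} \end{smallmatrix}\right)$ is equal to the number of sequences consistent with the transition count 
matrix $\left(\begin{smallmatrix} n_{0,0} - k_{0,0} & n_{0,1} - k_{0,1} \\ n_{1,0} - k_{1,0} & n_{1,1} - k_{1,1} \end{smallmatrix}\right)$, that is 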



( "i[-feo+ if a;„ = 

\n 0,0-fc 0,0' 1.1-K 1,1' 
\n 0,0-fe 0,0' 1,1-K 1,1' 



But, as we have said, since we have fixed $x_1 = 0$, it is $x_n = 0$ if $N$ is of the first kind, 
and $x_n = 1$ if $N$ is of the second kind. Then it is 



A''e<I>i(0,n) 



,1 



E 

JVe'I>2(0,n) 



"0,0 - fco,o / V"i,i ^ ^1,1 



Finally, the theorem follows by (21) and the fact that 

(no,i-feL) ("0,o)fco,o("(l" - "0,o)fc+_fc„ 



( < ) ("J) 